text term
Learning Inter-Related Statistical Query Translation Models for English-Chinese Bi-Directional CLIR
Zhang, Yuejie (Fudan University) | Cen, Lei (Fudan University) | Jin, Cheng (Fudan University) | Xue, Xiangyang (Fudan University) | Fan, Jianping (The University of North Carolina at Charlotte)
To support more precise query translation for English-Chinese Bi-Directional Cross-Language Information Retrieval (CLIR), we have developed a novel framework that integrates a semantic network to characterize the correlations between multiple inter-related text terms of interest and to learn their inter-related statistical query translation models. First, a semantic network is automatically generated from large-scale English-Chinese bilingual parallel corpora to characterize the correlations between a large number of text terms of interest. Second, the semantic network is exploited to learn the statistical query translation models for these text terms. Finally, the inter-related query translation models are used to translate queries more precisely and achieve more effective CLIR. Our experiments on a large amount of official public data obtained very positive results.
- Asia > China > Shanghai > Shanghai (0.05)
- North America > United States > North Carolina (0.04)
- Asia > China > Hong Kong (0.04)
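As a rough illustration of the first two steps in the abstract above (building a term network from a parallel corpus, then deriving translation scores from it), here is a minimal sketch. It is not the authors' method: the abstract does not specify how correlations are measured, so plain co-occurrence counts in aligned sentence pairs are assumed, and the function names `cooccurrence_network` and `translation_scores` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_network(aligned_pairs):
    """Count how often an English term and a Chinese term share an aligned pair.

    Assumption: co-occurrence counts stand in for the paper's correlations.
    """
    edges = Counter()
    for en_terms, zh_terms in aligned_pairs:
        # set() so a term repeated inside one sentence is counted once per pair
        for en, zh in product(set(en_terms), set(zh_terms)):
            edges[(en, zh)] += 1
    return edges


def translation_scores(edges, en_term):
    """Normalize the counts for one English term into relative frequencies."""
    counts = {zh: c for (en, zh), c in edges.items() if en == en_term}
    total = sum(counts.values())
    return {zh: c / total for zh, c in counts.items()} if total else {}


# Toy aligned corpus (hypothetical data, already tokenized into terms)
pairs = [
    (["bank", "river"], ["河", "岸"]),
    (["bank", "loan"], ["银行", "贷款"]),
    (["bank", "account"], ["银行", "账户"]),
]
network = cooccurrence_network(pairs)
scores = translation_scores(network, "bank")
```

In this toy corpus the highest-scoring candidate for "bank" is the Chinese term that co-occurs with it most often. A real system would need much more than raw counts, for example alignment models and the inter-term dependencies the abstract emphasizes.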
Large-Scale Community Detection on YouTube for Topic Discovery and Exploration
Gargi, Ullas (Google, Inc.) | Lu, Wenjun (University of Maryland) | Mirrokni, Vahab (Google, Inc.) | Yoon, Sangho (Google, Inc.)
Detecting coherent, well-connected communities in large graphs provides insight into the graph structure and can serve as the basis for content discovery. Clustering is a popular technique for community detection, but global algorithms that examine the entire graph do not scale. Local algorithms are highly parallelizable but perform sub-optimally, especially in applications where we need to optimize multiple metrics. We present a highly scalable multi-stage algorithm based on local clustering that combines a pre-processing stage, a local clustering stage, and a post-processing stage. We apply it to the YouTube video graph to generate named clusters of videos with coherent content. We formalize coverage, coherence, and connectivity metrics and evaluate the quality of the algorithm on large YouTube graphs. Our use of local algorithms for global clustering, and its implementation and practical evaluation at such a large scale, are the first of their kind.
- North America > United States > California > Santa Clara County > Mountain View (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
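The abstract above mentions coverage and connectivity metrics without defining them. The sketch below shows one common way such quantities could be computed for a clustering of a graph. Both definitions are assumptions rather than the paper's formalization: `node_coverage` is taken to be the fraction of nodes assigned to at least one cluster, and `internal_edge_fraction` is a hypothetical stand-in for connectivity.

```python
def node_coverage(nodes, clusters):
    """Fraction of nodes that fall into at least one cluster (assumed definition)."""
    covered = set().union(*clusters) if clusters else set()
    return len(covered & set(nodes)) / len(nodes) if nodes else 0.0


def internal_edge_fraction(edges, cluster):
    """Share of edges touching the cluster that stay entirely inside it."""
    touching = [(u, v) for u, v in edges if u in cluster or v in cluster]
    if not touching:
        return 0.0
    internal = sum(1 for u, v in touching if u in cluster and v in cluster)
    return internal / len(touching)


# Toy undirected video graph (hypothetical data)
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
clusters = [{"a", "b", "c"}]

coverage = node_coverage(nodes, clusters)
fraction = internal_edge_fraction(edges, clusters[0])
```

Here three of the five nodes are covered, and three of the four edges touching the cluster are internal to it.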